The version of R is as listed below.

version
##                _                                
## platform       x86_64-w64-mingw32               
## arch           x86_64                           
## os             mingw32                          
## crt            ucrt                             
## system         x86_64, mingw32                  
## status                                          
## major          4                                
## minor          2.2                              
## year           2022                             
## month          10                               
## day            31                               
## svn rev        83211                            
## language       R                                
## version.string R version 4.2.2 (2022-10-31 ucrt)
## nickname       Innocent and Trusting

The version of Python is listed below.

import sys
sys.version
## '3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:42:03) [MSC v.1929 64 bit (AMD64)]'

Objective

The purpose of this assignment is to recreate the base R code from the textbook lab in higher-level R and in Python. Each step appears in up to three blocks of code that do the same thing: one in base R, one in "fancy" R (dplyr and plotly), and one in Python. The original base R code is taken from https://hastie.su.domains/ISLR2/Labs/Rmarkdown_Notebooks/Ch2-statlearn-lab.html. All other code is my own.

Libraries

First we install the packages needed for this notebook (I have already installed them, so the line is commented out). Then we load the libraries.

# install.packages(c("dplyr", "plotly", "htmlwidgets", "GGally")) # Package names must be passed as one character vector
library(dplyr) # Used for fancier R manipulation
library(plotly) # Used for fancier plots
library(htmlwidgets) # Used to save plotly plots
library(GGally) # Used for pairs plots

Next, for Python, we install the packages with pip. Again, I have already done that in the terminal. You can also run the commands right here by putting % in front of them (although this is a little hacky).

# %pip install plotly
# %pip install pandas
# %pip install numpy

And we load them.

import numpy as np # For vectors, matrices etc.
import plotly.express as px # For plotting simple graphs
import plotly.graph_objects as go # For plotting more complex graphs
import pandas as pd # For data frames

Basic Commands

Vectors

The first block of code assigns a vector. Note that R's traditional assignment operator, <-, corresponds to = in Python:

# In R
x <- c(1, 3, 2, 5)
x
## [1] 1 3 2 5

In Python we use [] to create a list. In order to create a vector we need to use the linear algebra library numpy.

# In Python
x = np.array([1,3,2,5])
x
## array([1, 3, 2, 5])

The next vector is assigned with = in R, which works as well.

# In R
x = c(1, 6, 2)
x
## [1] 1 6 2
# In Python
x = np.array([1, 6, 2])
x
## array([1, 6, 2])
# In R
y = c(1, 4, 3)
# In Python
y = np.array([1, 4, 3])

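Next is the length() function. For numpy arrays the closest equivalent is the shape attribute, which returns a tuple of dimension sizes; len(x) also works for a one-dimensional array.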

# In R
length(x)
## [1] 3
length(y)
## [1] 3

And Python.

# In Python
x.shape
## (3,)
y.shape
## (3,)
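
As a small sketch of the options (the variable names n_len, n_size, and dims are my own and not part of the lab), here are three ways to ask a numpy array how big it is, with the nearest R counterpart noted in the comments.

```python
import numpy as np

x = np.array([1, 6, 2])

n_len = len(x)    # length of the first axis; matches R's length() for a vector
n_size = x.size   # total number of elements; matches R's length() for a matrix
dims = x.shape    # tuple of axis lengths; closest to R's dim()

print(n_len, n_size, dims)
```

For a one-dimensional array all three agree on the count; they only start to differ once the array has more than one axis.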

And elementwise addition.

# In R
x + y
## [1]  2 10  5
# In Python
x + y
## array([ 2, 10,  5])

Variables

Next we look at the ls() function, which lists the objects in the workspace. In Python the dir() function does the same job. Python returns a bit more, including built-in names and imported packages.

# In R
ls()
## [1] "x" "y"
# In Python
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'go', 'np', 'pd', 'px', 'r', 'sys', 'x', 'y']

Now we remove some variables.

# In R
rm(x, y)
ls()
## character(0)
# In Python
del x,y
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'go', 'np', 'pd', 'px', 'r', 'sys']

And all objects at once.

# In R
rm(list = ls())

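Python has no direct equivalent. The loop below deletes every name that does not start with __. Note that this removes the imported modules as well as the user-defined variables, and that the loop variable obj itself is left behind.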

# In Python
for obj in dir():
  if not obj.startswith("__"):
    del globals()[obj]
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'obj']
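
If the goal is to keep the imports, one option is to skip module objects when choosing what to delete. The sketch below is only an illustration: the helper user_variable_names and the dictionary demo are hypothetical, and demo stands in for globals() so the example does not wipe a real session.

```python
import math
import types


def user_variable_names(namespace):
    """Return names that are neither dunder names nor imported modules."""
    return [
        name
        for name, value in namespace.items()
        if not name.startswith("__") and not isinstance(value, types.ModuleType)
    ]


# Hypothetical namespace standing in for globals()
demo = {"__name__": "__main__", "math": math, "x": 1, "y": [1, 2]}

# The list of names is built first, so deleting while looping is safe
for name in user_variable_names(demo):
    del demo[name]

print(sorted(demo))
```

With this filter the imported module survives, so there is no need to import numpy again afterwards.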

Asking for help

Next we ask for help with the matrix function. (Uncomment if you choose; it opens a pop-up.)

# In R
# ?matrix

# In Python
import numpy as np # Import again because we just removed it
# help(np.array)

Matrices (or arrays if you are very dimensional)

Now we build some matrices.

# In R
x <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

For Python it is a little bit different. We can pass a single vector but then we have to reshape it to be a matrix.

# In Python
x = np.array([1,2,3,4]).reshape(2,2)
x
## array([[1, 2],
##        [3, 4]])

Notice that Python reshaped it by filling rows first (row-major order), whereas R fills columns first. To get the same matrix as in R we need to transpose it.

# In Python
x = np.array([1,2,3,4]).reshape(2,2) \
                       .transpose()
x
## array([[1, 3],
##        [2, 4]])
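
An alternative to transposing, shown here as a sketch rather than as the lab's method, is to ask numpy to fill columns first with order="F" (Fortran order), which is the order R uses. The names values, by_row, and by_col are my own.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

by_row = values.reshape(2, 2)             # default C order: fills rows first
by_col = values.reshape(2, 2, order="F")  # Fortran order: fills columns first, like R

print(by_row)
print(by_col)
```

For a square matrix the two approaches give the same result; order="F" has the advantage that it also works when the matrix is not square.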

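Conversely, R can fill by rows with the byrow flag, which gives the row-filled matrix that Python produced at first. Note that x in R is now the transpose of x in Python, so the outputs below are transposes of each other.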

# In R
x = matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE)

Next we square root and square the matrices.

# In R
sqrt(x)
##          [,1]     [,2]
## [1,] 1.000000 1.414214
## [2,] 1.732051 2.000000
x^2
##      [,1] [,2]
## [1,]    1    4
## [2,]    9   16

And Python.

# In Python
np.sqrt(x)
## array([[1.        , 1.73205081],
##        [1.41421356, 2.        ]])
np.square(x)
## array([[ 1,  9],
##        [ 4, 16]])

Random numbers

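Finally we generate some random numbers. Both rnorm() in R and np.random.normal() in Python default to \(\mu = 0\) and \(\sigma = 1\), but in Python the sample size is the third positional argument, so we pass the mean and standard deviation explicitly. Where R's cor() returns a single number, np.corrcoef() returns the full correlation matrix. The numbers also differ because the two languages draw different random samples.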

# In R
x <- rnorm(50)
y <- x + rnorm(50, mean = 50, sd = .1)
cor(x, y)
## [1] 0.9949742
# In Python
x = np.random.normal(0, 1, 50)
y = x + np.random.normal(50, .1, 50)
np.corrcoef(x,y)
## array([[1.        , 0.99301508],
##        [0.99301508, 1.        ]])
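
To get a single number like R's cor(), we can pick the off-diagonal entry of the matrix. The sketch below also uses keyword arguments so the defaults can be left out; it assumes numpy's newer default_rng generator, and the seed 0 is an arbitrary choice of mine.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is arbitrary, only for repeatability

x = rng.normal(size=50)                         # loc=0 and scale=1 are the defaults
y = x + rng.normal(loc=50, scale=0.1, size=50)  # shifted copy of x plus small noise

r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry is the correlation of x and y
print(r)
```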

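We can set the seed to make the code reproducible. The same seed value does not give the same numbers in R and Python, because the two use different random number generators.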

# In R
set.seed(3)
y <- rnorm(100)
mean(y)
## [1] 0.01103557
# In Python
np.random.seed(3)
y = np.random.normal(0, 1, 100)
np.mean(y)
## -0.10863707440606224

Finally we ask for some variances and standard deviations.

# In R
var(y)
## [1] 0.7328675
sqrt(var(y))
## [1] 0.8560768
sd(y)
## [1] 0.8560768

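Repeat in Python. Note that np.var() and np.std() divide by n by default, whereas R's var() and sd() divide by n - 1; pass ddof=1 to match R.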

# In Python
np.var(y)
## 1.132081888283007
np.sqrt(np.var(y))
## 1.0639933685333791
np.std(y)
## 1.0639933685333791
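
Here is a short sketch of the difference between the two divisors. The data vector is made up for the illustration and is not from the lab.

```python
import numpy as np

# Made-up data with mean 5 and squared deviations summing to 32
y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(y)             # divides by n (ddof=0), numpy's default
sample_var = np.var(y, ddof=1)  # divides by n - 1, as R's var() does
sample_sd = np.std(y, ddof=1)   # square root of the sample variance, as R's sd()

print(pop_var, sample_var, sample_sd)
```

With 100 observations the two versions differ by a factor of 100/99, so the gap is small but not zero.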

Graphics

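plotly is a JavaScript-based interactive plotting library with interfaces for R and Python. It shares many similarities with ggplot in R. In Python there is a bit more nuance: plotly.express covers simple plots and plotly.graph_objects covers more complicated ones. First in base R.

# In R
x <- rnorm(100)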
y <- rnorm(100)
plot(x, y)

Next in fancy R.

# In Fancy R
data.frame(x, y) |>
  plot_ly(x = ~x, y = ~y) |>  # Formulas refer to columns of the piped data frame
  add_markers() |>
  layout(title="Plot of Y vs X", xaxis=list(title="this is the x-axis"),
         yaxis=list(title="this is the y-axis"))

And Python.

# In Python
import plotly.express as px # Import again because the delete-all loop removed it too
fig = px.scatter(x=np.random.normal(0,1,100), y=np.random.normal(0,1,100),
                 title="Plot of X vs Y", labels=dict(x="this is the x-axis",
                                                     y="this is the y-axis"))
fig.show()

Now to save it.

# In R
library(htmlwidgets) # Back again

p <- data.frame(cbind(x,y)) |>
  plot_ly(x=x, y=y) |>
  add_markers() |>
  layout(title="Plot of X vs Y", xaxis=list(title="this is the x-axis"),
         yaxis=list(title="this is the y-axis"))

saveWidget(p, file="scatter.html")

And Python.

# In Python
fig = px.scatter(x=np.random.normal(0,1,100), y=np.random.normal(0,1,100),
                 title="Plot of X vs Y", labels=dict(x="this is the x-axis",
                                                     y="this is the y-axis"))

fig.write_html("scatter2.html")

Sequences

Next we generate some sequences, using seq() in R and range() in Python. Remember that range() excludes its stop value, so range(1, 11) runs from 1 to 10. For an evenly spaced grid of real numbers we use the np.linspace() function in Python.

# In R
x <- seq(1, 10)
x <- 1:10
x <- seq(-pi, pi, length = 50)

Python:

# In Python
x = range(1,11)
x = np.linspace(-np.pi, np.pi, 50)
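
As a sketch of the endpoint rules (the names ints and grid are mine), np.arange behaves like range() and leaves out the stop value, while np.linspace includes both endpoints, like seq() with a length argument in R.

```python
import numpy as np

ints = np.arange(1, 11)                # integers 1 to 10; the stop value 11 is excluded
grid = np.linspace(-np.pi, np.pi, 50)  # 50 evenly spaced points, both endpoints included

print(ints)
print(grid[0], grid[-1], len(grid))
```

np.arange returns a numpy array directly, which is handy if the sequence is going to be used in arithmetic afterwards.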

Now that we have a domain, we can define a function and plot it as a contour map as well as a surface.

# In R
y <- x
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
contour(x, y, f)
contour(x, y, f, nlevels = 45, add = T)

For the plotly version I chose a different number of levels, in keeping with data visualization standards. The base R version is just too cluttered.

# Fancy R
as.data.frame(cbind(x,y,f)) |>
  plot_ly(x=x, y=y) |>
    add_contour(z=matrix(f, nrow = length(y), byrow = TRUE),
                contours = list(
                    start=-.8,
                    end=.8,
                    size=.1,
                    showlabels = TRUE,
                    coloring="lines")) |>
    layout(title="Contour plot", xaxis=list(title="x"), yaxis=list(title="y"))

For Python things are similar but we use the meshgrid function.

# In Python
y = x
xr,yr = np.meshgrid(x,y) # Build a rectangular grid of (x, y) points covering the domain
f = np.cos(yr) / (1 + xr**2)
# In Python
import plotly.graph_objects as go # We really shouldn't have deleted everything ...

fig = go.Figure() \
        .add_contour(x=x,y=y,z=f, contours=dict(
            start=-.8,
            end=.8,
            size=.1,
            showlabels=True,
            coloring="lines")) \
        .update_layout(title="Contour plot") \
        .update_xaxes(title="x") \
        .update_yaxes(title="y")
fig.show()
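
The R and Python versions of f are laid out differently, which is easy to miss. The sketch below recomputes both layouts with numpy only; f_mesh and f_outer are names I made up, and f_outer imitates what R's outer(x, y, ...) returns by using broadcasting.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 50)
y = x

# meshgrid layout: xr[i, j] = x[j] and yr[i, j] = y[i]
xr, yr = np.meshgrid(x, y)
f_mesh = np.cos(yr) / (1 + xr**2)  # rows follow y, columns follow x

# Layout of R's outer(x, y, ...): rows follow x, columns follow y
f_outer = np.cos(y[np.newaxis, :]) / (1 + x[:, np.newaxis] ** 2)

print(np.allclose(f_outer, f_mesh.T))
```

plotly expects z with rows following y, which is why the R code rearranges f before plotting and the Python code can pass f as it is.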

Another surface.

# In R
fa <- (f - t(f)) / 2

contour(x, y, fa, nlevels = 15)

# In R
as.data.frame(cbind(x,y,fa)) |>
  plot_ly(x=x, y=y) |>
    add_contour(z=matrix(fa, nrow = length(y), byrow = TRUE),
                contours = list(
                    start=-.8,
                    end=.8,
                    size=.1,
                    showlabels = TRUE,
                    coloring="lines")) |>
    layout(title="Contour plot 2", xaxis=list(title="x"), yaxis=list(title="y"))

And Python.

# In Python
fa = (f - f.T) / 2

fig = go.Figure() \
        .add_contour(x=x,y=y,z=fa, contours=dict(
            start=-.8,
            end=.8,
            size=.1,
            showlabels=True,
            coloring="lines")) \
        .update_layout(title="Contour plot 2") \
        .update_xaxes(title="x") \
        .update_yaxes(title="y")

fig.show()
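
One thing worth knowing about this second surface is that (f - t(f)) / 2 is the antisymmetric part of f, so it flips sign when transposed and is zero along the diagonal. A minimal numpy check, with names of my own choosing:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 50)
xr, yr = np.meshgrid(x, x)
f = np.cos(yr) / (1 + xr**2)

fa = (f - f.T) / 2  # antisymmetric part of f

is_antisymmetric = np.allclose(fa.T, -fa)     # transposing flips the sign
diag_is_zero = np.allclose(np.diag(fa), 0.0)  # so the diagonal must vanish

print(is_antisymmetric, diag_is_zero)
```

This is why the contour plot looks mirrored with opposite sign across the line y = x.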

Let’s fill things in.

# In R
as.data.frame(cbind(x,y,fa)) |>
  plot_ly(x=x, y=y) |>
    add_contour(z=matrix(fa, nrow = length(y), byrow = TRUE),
                contours = list(
                    start=-.8,
                    end=.8,
                    size=.1,
                    showlabels = TRUE,
                    labelfont=list(color="black"))) |>
    layout(title="Image plot", xaxis=list(title="x"), yaxis=list(title="y"))

# In Python
fig = go.Figure() \
        .add_contour(x=x,y=y,z=fa, contours=dict(
            start=-.8,
            end=.8,
            size=.1,
            showlabels=True,
            labelfont=dict(color="black"))) \
        .update_layout(title="Image plot") \
        .update_xaxes(title="x") \
        .update_yaxes(title="y")
fig.show()

Going 3D and adding some perspective.

# In R
# For a 3D surface the axis titles live under scene, not the top-level xaxis/yaxis
as.data.frame(cbind(x,y,fa)) |>
  plot_ly(x=x, y=y) |>
    add_surface(z=matrix(fa, nrow = length(y), byrow = TRUE)) |>
    layout(title="Perspective plot",
           scene=list(xaxis=list(title="x"), yaxis=list(title="y")))
# In Python
# update_xaxes/update_yaxes only touch 2D axes, so set the titles through scene
fig = go.Figure() \
        .add_surface(x=x,y=y,z=fa) \
        .update_layout(title="Perspective plot",
                       scene=dict(xaxis_title="x", yaxis_title="y"))

fig.show()

Indexing Data

Now we slice and dice the arrays. Remember that R indexes from 1 while Python indexes from 0, and that Python slices exclude their end index. First define A.

# In R
A <- matrix(1:16, 4, 4)
A
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

Python:

# In Python
A = np.array(range(1,17)).reshape(4,4).transpose()
A
## array([[ 1,  5,  9, 13],
##        [ 2,  6, 10, 14],
##        [ 3,  7, 11, 15],
##        [ 4,  8, 12, 16]])

Slice away.

# In R
A[2, 3]
## [1] 10
# In Python
A[1,2]
## 10
# In R
A[c(1, 3), c(2, 4)]
##      [,1] [,2]
## [1,]    5   13
## [2,]    7   15

This is not so easy in Python: indexing with two plain lists picks out individual elements rather than a submatrix. It took a while to find the np.ix_ function.

# In Python
A[np.ix_([0,2], [1,3])]
## array([[ 5, 13],
##        [ 7, 15]])
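
To see why np.ix_ is needed, here is a small sketch comparing it with plain list indexing. The names pairs, block, and block_alt are mine.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4).T  # same matrix as above, filled by columns

pairs = A[[0, 2], [1, 3]]          # pairs the lists up: elements (0, 1) and (2, 3)
block = A[np.ix_([0, 2], [1, 3])]  # crosses them: rows 0 and 2 with columns 1 and 3
block_alt = A[[0, 2]][:, [1, 3]]   # same block, selecting rows first and then columns

print(pairs)
print(block)
```

The two-step version in block_alt is a little easier to read but makes an intermediate copy of the selected rows.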
# In R
A[1:3, 2:4]
##      [,1] [,2] [,3]
## [1,]    5    9   13
## [2,]    6   10   14
## [3,]    7   11   15
# In Python
A[:3, 1:4]
## array([[ 5,  9, 13],
##        [ 6, 10, 14],
##        [ 7, 11, 15]])
# In R
A[1:2, ]
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
# In Python
A[:2,:]
## array([[ 1,  5,  9, 13],
##        [ 2,  6, 10, 14]])
# In R
A[, 1:2]
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
# In Python
A[:, :2]
## array([[1, 5],
##        [2, 6],
##        [3, 7],
##        [4, 8]])
# In R
A[1, ]
## [1]  1  5  9 13
# In Python
A[0,:]
## array([ 1,  5,  9, 13])
# In R
A[-c(1, 3), ]
##      [,1] [,2] [,3] [,4]
## [1,]    2    6   10   14
## [2,]    4    8   12   16
# In Python
np.delete(A, [0,2], 0)
## array([[ 2,  6, 10, 14],
##        [ 4,  8, 12, 16]])
# In R
A[-c(1, 3), -c(1,3,4)]
## [1] 6 8
# In Python
np.delete(np.delete(A, [0,2], 0), [0,2,3], 1).flatten()
## array([6, 8])
# In R
dim(A)
## [1] 4 4
# In Python
A.shape
## (4, 4)
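
Python has no negative "drop these" indexing like A[-c(1, 3), ], since negative indices count from the end instead. Besides np.delete, another option is a boolean mask that marks the rows and columns to keep. This is a sketch with hypothetical names keep_rows, keep_cols, and sub.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4).T

keep_rows = np.ones(A.shape[0], dtype=bool)
keep_rows[[0, 2]] = False     # drop rows 1 and 3 (R numbering)

keep_cols = np.ones(A.shape[1], dtype=bool)
keep_cols[[0, 2, 3]] = False  # drop columns 1, 3 and 4 (R numbering)

sub = A[np.ix_(keep_rows, keep_cols)]  # np.ix_ also accepts boolean masks
print(sub.ravel())
```

Unlike R, numpy keeps the result two-dimensional here, so ravel() (or flatten()) is needed to get a plain vector.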

Loading Data

Now let’s get to some data. We read in a data set about cars from the website https://hastie.su.domains/ISLR2/Labs/. Read it in and take a look.

# In R
Auto <- read.csv("Auto.csv", header = T, na.strings = "?", stringsAsFactors = T)
View(Auto)
head(Auto)
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
## 5  17         8          302        140   3449         10.5   70      1
## 6  15         8          429        198   4341         10.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
## 5               ford torino
## 6          ford galaxie 500
dim(Auto)
## [1] 397   9
# In Python
import pandas as pd

Auto = pd.read_csv("Auto.csv")
Auto.head()
##     mpg  cylinders  displacement  ... year  origin                       name
## 0  18.0          8         307.0  ...   70       1  chevrolet chevelle malibu
## 1  15.0          8         350.0  ...   70       1          buick skylark 320
## 2  18.0          8         318.0  ...   70       1         plymouth satellite
## 3  16.0          8         304.0  ...   70       1              amc rebel sst
## 4  17.0          8         302.0  ...   70       1                ford torino
## 
## [5 rows x 9 columns]
Auto.shape
## (397, 9)
# In R
Auto[1:4, ]
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst

Now dplyr comes into the picture.

# Fancy R
Auto |> slice(1:4)
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
# In Python
Auto.iloc[:4,:]
##     mpg  cylinders  displacement  ... year  origin                       name
## 0  18.0          8         307.0  ...   70       1  chevrolet chevelle malibu
## 1  15.0          8         350.0  ...   70       1          buick skylark 320
## 2  18.0          8         318.0  ...   70       1         plymouth satellite
## 3  16.0          8         304.0  ...   70       1              amc rebel sst
## 
## [4 rows x 9 columns]
# In R
Auto <- na.omit(Auto)
dim(Auto)
## [1] 392   9
# Fancy R
Auto |> na.omit() |>
        dim()
## [1] 392   9
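# In Python
# No rows are dropped: read_csv() was called without na_values="?", so the
# "?" entries are strings, not NaN. dropna() also returns a copy.
Auto.dropna().shape
## (397, 9)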
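
To match R's na.strings = "?", pandas needs to be told about the marker when the file is read. The sketch below uses a made-up two-row excerpt in the same layout as Auto.csv, held in a string, so it runs without the real file.

```python
import io

import pandas as pd

# Hypothetical excerpt; the real Auto.csv has nine columns
csv_text = (
    "mpg,horsepower,name\n"
    "18.0,130,chevrolet chevelle malibu\n"
    "25.0,?,ford pinto\n"
)

raw = pd.read_csv(io.StringIO(csv_text))                    # "?" stays a string
clean = pd.read_csv(io.StringIO(csv_text), na_values="?")   # "?" becomes NaN

print(raw.dropna().shape)    # nothing to drop
print(clean.dropna().shape)  # the row with the missing value is dropped
```

Reading with na_values="?" also lets pandas treat horsepower as a numeric column instead of text.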
# In R
names(Auto)
## [1] "mpg"          "cylinders"    "displacement" "horsepower"   "weight"      
## [6] "acceleration" "year"         "origin"       "name"
# Fancy R
Auto |> names()
## [1] "mpg"          "cylinders"    "displacement" "horsepower"   "weight"      
## [6] "acceleration" "year"         "origin"       "name"
# In Python
Auto.columns
## Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
##        'acceleration', 'year', 'origin', 'name'],
##       dtype='object')

Additional Graphical and Numerical Summaries

Now let’s do some more plotting.

# In R
plot(Auto$cylinders, Auto$mpg)

# Fancy R
Auto |> plot_ly(x=~cylinders, y=~mpg) |>
        add_markers() |>
        layout(title="Number of Cylinders vs Miles per Gallon",
               xaxis=list(title="Cylinders"),
               yaxis=list(title="MPG"))
# In Python

fig = px.scatter(Auto, x="cylinders", y="mpg",
                 title="Number of Cylinders vs Miles per Gallon",
                 labels=dict(cylinders="Cylinders",
                 mpg="MPG"))

fig.show()

Box plots

You can attach() the data frame if you are so inclined; that just makes its column names available as variables in R. The plot remains the same. The document from the textbook website draws many box plots, changing one option at a time. I have put them all together so you don’t see so many. Note that plotly does not have the varwidth option. Instead I added jittered points so you can see how many observations there are.

# In R
attach(Auto)
## The following object is masked from package:ggplot2:
## 
##     mpg
cylinders <- as.factor(cylinders)
plot(cylinders, mpg, col = "red", varwidth = T)

# Fancy R
Auto |> mutate(cylinders = as.factor(cylinders)) |>
        plot_ly(x=~mpg, y=~cylinders, color="red") |>
        add_boxplot(line=list(color="red"),marker=list(color="red"), boxpoints="all", jitter=.3 ) |>
        layout(title="Number of Cylinders vs Miles per Gallon",
               xaxis=list(title="MPG"),
               yaxis=list(title="Cylinders"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

# In Python

Auto['cylinders'] = Auto['cylinders'].astype("object")

fig = px.box(Auto, x="mpg", y="cylinders", points="all",
             title="Number of Cylinders vs Miles per Gallon",
             labels=dict(cylinders="Cylinders", mpg="MPG"))

fig.show()

Histograms

And now for some histograms. Again the lab plots many of them; I will just plot one version with all the options in it.

# In R
hist(mpg, col = 2, breaks = 15)

# Fancy R
# The histogram attribute for the number of bins is nbinsx, not nbins
Auto |> plot_ly(x=~mpg) |>
        add_histogram(nbinsx=15, color="red", stroke=list(color="black")) |>
        layout(title="Histogram of MPG",
               xaxis=list(title="MPG"),
               yaxis=list(title="Frequency"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# In Python
fig = px.histogram(Auto, x="mpg", nbins=15,
             title="Histogram of MPG",
             labels=dict(mpg="MPG"))

fig.show()

Pairs plot

Finally the pairs plot. Let’s not plot every variable as the lab does, because the full grid is overwhelming.

# In R
pairs(
    ~ mpg + displacement + horsepower + weight + acceleration,
    data = Auto
  )

And now for an upgrade.

# Fancy R
(ggpairs(Auto, columns = c(1,3:6), title="Pairs Plot")) |> ggplotly()
## Warning: Can only have one: highlight

# In Python
fig = px.scatter_matrix(Auto[["mpg","displacement","horsepower","weight", "acceleration"]],
                        title="Pairs Plot", width=1000, height=1000)
## C:\Users\COLINJ~1\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\plotly\express\_core.py:279: FutureWarning:
## 
## iteritems is deprecated and will be removed in a future version. Use .items instead.
fig.show()

Summaries

For the last part we make summaries of the data frames and variables. First, the whole frame.

# In R
summary(Auto)
##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365
# Fancy R
Auto |> summary()
##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365
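# In Python
# describe() covers numeric columns only: horsepower (read as strings because of "?")
# and cylinders (cast to object above) are left out, and all 397 rows are used.
Auto.describe()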
##               mpg  displacement  ...        year      origin
## count  397.000000    397.000000  ...  397.000000  397.000000
## mean    23.515869    193.532746  ...   75.994962    1.574307
## std      7.825804    104.379583  ...    3.690005    0.802549
## min      9.000000     68.000000  ...   70.000000    1.000000
## 25%     17.500000    104.000000  ...   73.000000    1.000000
## 50%     23.000000    146.000000  ...   76.000000    1.000000
## 75%     29.000000    262.000000  ...   79.000000    2.000000
## max     46.600000    455.000000  ...   82.000000    3.000000
## 
## [8 rows x 6 columns]
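
R's summary() reports on every column, including the factor name. A rough pandas counterpart is describe(include="all"), sketched here on a tiny made-up data frame rather than on Auto.

```python
import pandas as pd

# Made-up three-row frame with one numeric and one text column
df = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "name": ["a", "b", "a"]})

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include="all")  # adds count, unique, top, freq for text columns

print(numeric_only.columns.tolist())
print(everything.columns.tolist())
```

Statistics that do not apply to a column, such as the mean of a text column, show up as NaN in the combined table.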

And with one variable.

# In R
summary(mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   17.00   22.75   23.45   29.00   46.60
# Fancy R
Auto |> select(mpg) |>
        summary()
##       mpg       
##  Min.   : 9.00  
##  1st Qu.:17.00  
##  Median :22.75  
##  Mean   :23.45  
##  3rd Qu.:29.00  
##  Max.   :46.60
# In Python
Auto["mpg"].describe()
## count    397.000000
## mean      23.515869
## std        7.825804
## min        9.000000
## 25%       17.500000
## 50%       23.000000
## 75%       29.000000
## max       46.600000
## Name: mpg, dtype: float64